Multi-region electricity demand prediction with ensemble deep neural networks

Electricity consumption prediction plays a vital role in intelligent energy management systems, and electricity suppliers need accurate short- and long-term energy predictions. In this study, a deep ensemble neural network was used to predict hourly power consumption, providing a clear and effective approach to the problem. The dataset comprises 13 files, each representing a different region, and spans 2004 to 2018, with two columns: a timestamp (date, time, and year) and the energy expenditure. The data were normalized using a min-max scaler, and a deep ensemble model (long short-term memory and recurrent neural network) was used for energy consumption prediction. The proposed model effectively learns long-term dependencies in sequence order and was assessed using several statistical metrics: root mean squared error (RMSE), relative root mean squared error (rRMSE), mean absolute bias error (MABE), coefficient of determination (R²), mean bias error (MBE), and mean absolute percentage error (MAPE). The results show that the proposed model outperforms existing models, indicating its effectiveness in accurately predicting energy consumption.


Introduction
Energy plays a crucial role in maintaining the social, economic, and environmental sustainability of any nation. Over the last decade, global energy consumption has grown significantly, making energy management vital for nations seeking better economic growth and environmental safety [1]. Predictions of energy consumption help in formulating energy development strategies and designing energy policies [2]. Energy consumption is increasing every day in emerging and third-world countries, and the world population is anticipated to reach 9.5 billion by 2030 [3]. With improved data accumulation, deep learning algorithms have made complex training easier. Today, both deep and shallow learning models have their roles in various fields. The choice of the best machine learning algorithm for a given analysis depends on factors such as data categories, sample size, and model properties. Statistical prediction methods are used for time series forecasting. Future loads and predictor variables are predicted from historical observations using the load signal and other forecasting tools. Several studies have worked on short-term prediction models, whose horizons range from an hour or less up to a week, using linear regression models, multiple regression models, smoothing, and weighted least squares [16,17]. Additionally, machine learning models such as support vector machines (SVM), artificial neural networks (ANN), and fuzzy inference systems (FIS) have been used to predict electricity consumption in buildings. SVM has been used to predict the electricity consumption of office buildings from weather data and historical consumption data [18]. ANN has been used to predict electricity consumption in residential buildings using data such as weather, occupancy, and time of day [19]. FIS has been used to predict the energy consumption of a building from factors such as temperature, humidity, and occupancy patterns [17].
These models have shown promising results in accurately predicting electricity consumption in buildings, which can help in optimizing energy usage and reducing energy costs.
J. F. Torres et al. used innovative metering technologies to gather multiple time series datasets and developed a multiple forecasting methodology based on an RNN with LSTM [20]. Their aim was day-ahead prediction of electricity consumption in individual households and industries. They used the MSE and MAE performance metrics to evaluate the LSTM model on energy consumption time series. By using LSTM instead of typical univariate and classical methods, they were able to use fewer computational resources while still achieving accurate predictions.
In [21], Z. Zhang proposed a technique that combines LSTM and CNN to forecast electricity consumption and to support suitable operation and efficient management of the smart grid. The method involves two variates of a sequence; to obtain the electricity demand forecast, LSTM and ANNs are used to connect a pair of crucial values linked to contextual information, which is then used to produce sets. The authors compared the performance of (c,1) LSTM with ARIMA and with sequence-to-sequence (S2S) LSTM and found that their method offered better accuracy in terms of the MAPE and RPSE performance metrics.
M.-W. Li [22] examined a country's partial electricity demand, predicting industrial electricity demand with an eye on the 2025 horizon. The authors modeled electricity demand from 1966 to 2016 and added the following information to make a prediction: the original cost of explanatory variables, the cost of power, its industrial value, and the working-age population. The model shows that industrial electricity demand has relatively low long-run elasticities (0.1 and within 0.2-0.3, respectively) for both price and income. Z. A. Khan [23] introduced an efficient method for short-term electricity load forecasting, which is essential for effective energy management. The authors proposed a hybrid model that combines the advantages of the autoregressive integrated moving average (ARIMA) and support vector regression (SVR) models. The model was tested on real-world electricity load data obtained from the Pakistan Energy Management Company (PEMCO). The results show that the hybrid model outperforms both the ARIMA and SVR models individually in terms of forecasting accuracy.
In [24], an efficient and effective hybrid model is developed for power generation and consumption forecasting, contributing to energy harvesting by providing valuable prediction data to the concerned renewable energy analysts. The model integrates a convolutional neural network with an echo state network to extract meaningful patterns from historical data and learn temporal features for robust renewable energy generation and consumption forecasting. The output spatiotemporal feature vector is then fed to fully connected layers for final forecasting.
In [25], the proposed ESNCNN model combines an Echo State Network (ESN) and a Convolutional Neural Network (CNN) to learn the nonlinear mapping relationship and extract spatial information from renewable energy data. The authors also use residual connections to avoid the vanishing gradient problem and fully connected layers to enhance and select the optimal features for predicting future energy production.
In [26], the proposed framework combines convolutional neural networks (CNN) with a long short-term memory autoencoder (LSTM-AE) to extract both spatial and temporal features from the electricity consumption data. The CNN is used to extract the spatial information, while the LSTM-AE is used to capture the temporal dependencies. The output of the LSTM-AE is fed into fully connected layers for final forecasting. The proposed framework is evaluated on three real-world datasets and compared with state-of-the-art models using various evaluation metrics.
In [27], the authors proposed a deep feed-forward neural network approach to estimate power usage. They utilized the Apache Spark platform for distributed computing and the H2O data analysis framework. Although different electricity consumption prediction studies are useful depending on specific needs, they are limited by time frames and prediction metrics. In this paper, we propose a deep ensemble neural network for the problem of electricity consumption prediction. Our method is compared to various baseline methods that use machine learning algorithms and a multi-sequence LSTM model. Thus, the paper makes the following contributions to the existing literature:
• Highlighting the importance of electricity consumption prediction in intelligent energy management systems
• Describing the use of a deep-ensembled neural network for predicting power consumption
• Outlining the statistical metrics used to assess the performance of the proposed model, including RMSE, rRMSE, MABE, R², MBE, and MAPE
• Demonstrating the effectiveness of the proposed model in accurately predicting energy consumption, with results showing superior performance compared to existing models
The rest of the paper is organized as follows: Section 2 presents the materials and methods of the proposed work, Section 3 displays the findings of the proposed algorithm on the dataset, Section 4 discusses the purpose and findings of our work, and finally, the paper is concluded in Section 5.

Dataset details
The dataset contains thirteen files and their graphical representations are presented in

System architecture
The proposed methodology is depicted in Fig 3. It begins by choosing the time series data to be normalized. The network parameters, which serve as the foundation for the ensemble model, are then selected. Next, training of the ensemble model is initialized and the occurrence of the error is examined: if the error has occurred, ensemble training ends; if not, the procedure is repeated. In the third phase, the statistical measurements are examined and feature fusion takes place. The ensemble model is then developed and evaluated against previous research. If the findings are satisfactory, the process finishes; otherwise, the normalization step is repeated. A detailed description of the system architecture follows.
Dataset preprocessing. The performance of machine learning algorithms tends to improve when numerical inputs are scaled to a standard range. Two types of algorithms benefit in particular: those that measure distances between examples, such as k-nearest neighbors (KNN), and those that use a weighted sum of the inputs, such as linear regression. Normalization and standardization are the two most widely used methods for scaling numeric data before modeling. Normalization scales each variable separately to the range 0-1, which is the range of floating-point values with the most precision.
Accordingly, the scale and distribution of the data drawn from the domain may differ for every variable. Input variables may have different scales because of their different units (such as feet, kilometers, and hours). Differences in scale across input variables may increase the difficulty of the problem being modeled. A model may learn large weight values as a result of large input values, such as a spread of hundreds or thousands of units. Large weight values indicate that the model is unstable, meaning that it may perform poorly during learning and be sensitive to input values, leading to a higher generalization error. The scikit-learn library supports both data normalization and standardization. Normalization rescales the data from the original range so that all values fall within the new range of 0 to 1. To perform normalization, one must know, or be able to estimate accurately, the maximum and minimum observable values. A value is normalized as follows:
$$Y = \frac{X - \min}{\max - \min}$$
Here, Y is the scaled output, X is the input value, min is the column's minimum value, and max is the column's maximum value. MinMaxScaler scales all of the observed data into the range [0, 1], or into another range such as [-1, 1] if one is requested. In the presence of outliers, this scaling can compress all of the inliers into a narrow range such as [0, 0.005]. Because outliers influence the computation of the empirical mean and standard deviation, StandardScaler does not guarantee balanced feature scales in the presence of outliers; the range of the feature values consequently shrinks. The following are some best practices for using the MinMaxScaler and other scaling techniques:
1. Fit the scaler using the available training data. For normalization, this means that the training data are used to estimate the minimum and maximum observable values. To do this, use the fit() function.
2. Apply the scale to the training data. This means that the algorithm can be trained on the normalized data. Calling the transform() method accomplishes this.
3. Apply the scale to data going forward. This means that fresh data can be prepared in the future for predictions.
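The three scaling steps listed above can be sketched as follows. This is a minimal, hand-rolled illustration rather than the authors' code: the class name `MinMaxScaler01` and the load values are hypothetical, and the class only mimics the fit()/transform() pattern that scikit-learn's MinMaxScaler exposes.

```python
class MinMaxScaler01:
    """Hand-rolled min-max scaler (illustrative; not scikit-learn's class)."""

    def fit(self, values):
        # Estimate the observable minimum and maximum from training data only.
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        # Y = (X - min) / (max - min); a constant column is mapped to 0.0.
        span = self.max_ - self.min_
        if span == 0:
            return [0.0 for _ in values]
        return [(x - self.min_) / span for x in values]


# Hypothetical hourly loads in MW (made-up numbers, not taken from the dataset).
train_load = [12000.0, 15000.0, 18000.0, 13500.0]
new_load = [16500.0, 19500.0]

scaler = MinMaxScaler01().fit(train_load)    # step 1: fit on training data
train_scaled = scaler.transform(train_load)  # step 2: scale the training data
new_scaled = scaler.transform(new_load)      # step 3: scale data going forward
```

Note that a future value above the training maximum (here 19500.0) is mapped outside [0, 1], which is why the minimum and maximum should be estimated from data that cover the expected range.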
Proposed deep ensemble learning. Different algorithms are used to predict future energy consumption. However, the accuracy of a single model may not be adequate for the given dataset, because every algorithm has limitations and it is not easy to cope with all challenges using just one choice. For this reason, different algorithms are combined to improve the results. Ensemble methods are a machine learning technique that combines several base models to produce one optimal predictive model. Ensemble learning is a general approach to machine learning that seeks better predictive performance by combining the predictions from multiple models.
Although a seemingly unlimited number of ensembles can be built for a predictive modeling problem, three strategies dominate the field of ensemble learning: boosting, stacking, and bagging. It is essential both to understand each technique in detail and to consider each of them on a predictive modeling project. The objective of any machine learning problem is to find a single model that best predicts the desired outcome. Instead of building one model and hoping that it is an accurate predictor, ensemble methods take a set of models into account and average them to produce one final model.
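As a minimal sketch of the averaging idea described above, the snippet below combines the predictions of two base models by taking their mean at each time step. The function name `average_ensemble` and the prediction values are hypothetical, and simple averaging is an assumption made for illustration; the text does not specify how the RNN and LSTM outputs are fused.

```python
def average_ensemble(*model_predictions):
    """Average aligned prediction lists from several base models."""
    lengths = {len(p) for p in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of steps")
    n_models = len(model_predictions)
    # zip(*...) groups the predictions of every model for the same time step.
    return [sum(step) / n_models for step in zip(*model_predictions)]


# Hypothetical normalized hourly predictions from two base models.
rnn_pred = [0.40, 0.55, 0.70]
lstm_pred = [0.50, 0.45, 0.80]
combined = average_ensemble(rnn_pred, lstm_pred)
```

Weighted averaging or a stacked meta-model would follow the same pattern, with the plain mean replaced by a learned combination.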
Recurrent neural network. Recurrent neural networks (RNNs) are a specific type of neural network in which the calculation at each step is informed by the outcome of the previous step. In conventional neural networks, by contrast, inputs and outputs are independent of one another. The hidden layer, which stores information about a sequence, is the essential and most important feature of RNNs, as shown in Fig 4. To train an RNN, the input is supplied to the network one time step at a time. The current state is then computed from the current input and the previous state, and the current state becomes the previous state for the next time step. Depending on the problem, one can go back as many time steps as needed and combine the information from all of the earlier states. After all time steps are complete, the output is computed from the final current state. The output is then compared with the target output, that is, the actual output, and the error is generated. The error is backpropagated through the network to update the weights, and in this way the RNN is trained [28].
An RNN is a deep learning model that learns from sequences. It recursively adds the past output to the current input, so that the current output uses the past output to learn from the previous sequence. The previous sequence is supplied alongside the current input as a second input. As a result, previous sequences influence the current output, and past sequences combine to carry earlier outcomes forward. In a dataset that flows sequentially, each data point at time t occurs in reference to the past data. RNNs have been used in machine learning to extract the underlying pattern and meaning from such sequential information [29].
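The recurrence described above can be illustrated with a short NumPy sketch of a simple (Elman-style) RNN forward pass. The function name `rnn_forward`, the layer sizes, and the random weights are assumptions chosen for illustration; they are not the configuration used in this study, and training by backpropagation through time is omitted.

```python
import numpy as np


def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a simple RNN over a sequence and return every hidden state."""
    hidden = np.zeros(w_rec.shape[0])  # initial state: all zeros
    states = []
    for x_t in inputs:
        # The new state combines the current input with the previous state.
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden + bias)
        states.append(hidden)
    return np.stack(states)


rng = np.random.default_rng(0)
n_features, n_hidden, n_steps = 1, 4, 24  # e.g. 24 hourly readings (assumed)
w_in = rng.normal(scale=0.5, size=(n_hidden, n_features))
w_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
bias = np.zeros(n_hidden)

sequence = rng.random((n_steps, n_features))  # stand-in for normalized load
states = rnn_forward(sequence, w_in, w_rec, bias)
```

Each row of `states` depends on all earlier inputs through `hidden`, which is the sense in which previous sequences influence the current output.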
Specifically, in an input-output recurrent model architecture based on the multilayer perceptron (MLP), a single input is applied to a tapped-delay-line memory of s units, and a single output is fed back to the input through another tapped-delay-line memory of t units. The contents of these two tapped-delay-line memories feed the MLP's input layer. u(n) denotes the current value of the model input, and y(n+1) denotes the corresponding value of the model output; in other words, the output is ahead of the input by one time unit. The signal vector applied to the input layer of the MLP therefore consists of a data window made up of the present and past values of the input, which represent exogenous inputs originating from outside the network, and the delayed values of the output, on which the model output is regressed. This recurrent network is called the nonlinear autoregressive with exogenous inputs (NARX) model [30].
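To make the data window concrete, the sketch below assembles the vector that would be applied to the MLP input layer when predicting y(n+1). The helper name `narx_regressor` and the sample series are hypothetical; the delay-line lengths s and t follow the notation of the paragraph above.

```python
def narx_regressor(u, y, n, s, t):
    """Build the MLP input vector used to predict y(n+1).

    u: exogenous input series, y: output series (same indexing as u),
    s: length of the input delay line, t: length of the output delay line.
    """
    if n + 1 < max(s, t) or n >= min(len(u), len(y)):
        raise ValueError("index n does not leave enough history in u and y")
    past_inputs = [u[n - k] for k in range(s)]   # u(n), ..., u(n-s+1)
    past_outputs = [y[n - k] for k in range(t)]  # y(n), ..., y(n-t+1)
    return past_inputs + past_outputs


# Hypothetical series, used only to show which samples enter the window.
u = [0, 1, 2, 3, 4]
y = [10, 11, 12, 13, 14]
window = narx_regressor(u, y, n=3, s=2, t=3)
```

With s = 2 and t = 3, the window holds two input samples followed by three delayed output samples, so the MLP input layer has s + t units.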
Long short-term memory. Time series prediction is a difficult type of predictive modeling problem. Unlike regression predictive modeling, time series adds the complexity of a sequence dependence among the input variables. Recurrent neural networks are a powerful class of neural networks designed to handle sequence dependence. The long short-term memory network, or LSTM network, is a type of RNN used in machine learning because very large architectures can be trained successfully. It is trained using backpropagation through time. As a result, it can be used to build large recurrent networks, which in turn can address difficult sequence problems in machine learning and achieve state-of-the-art results. Instead of neurons, LSTM networks have memory blocks that are connected through layers [31].
The LSTM makes it easy to capture the temporal relation between sequences, as depicted in Fig 5. Its internal memory unit and gate mechanism resolve the exploding and vanishing gradient problems that affect typical RNN training. The input gate, output gate, forget gate, and cell state are the four main units that make up the internal structure of the LSTM model. The three gates are responsible for maintaining and updating the information in the cell state. Eqs (2)-(7) show the computational process:
$$r_t = \sigma\left(p_w\,[p_{t-1}, y_t] + b_r\right) \quad (3)$$
$$w_t = \tanh\left(p_r\,[p_{t-1}, y_t] + b_w\right) \quad (5)$$
Here, σ is the sigmoid activation function. The values of the forget, input, and output gates are denoted by q_t, r_t, and s_t, respectively. The memory cell is denoted by u_t, and the update and activation of the current cell state by a_t. The output vector result at time t is represented by the vectors v_t and y_t. p_{q,r,w,s} and b_{q,r,w,s} are the weight matrices and the bias vectors, respectively [32,33].

PLOS ONE

Network parameters of LSTM: Trainable weights, also known as trainable parameters, are used to characterize the complexity of the network. The layers of the network, such as the input, hidden, and output layers, embody the trainable weights in the structure and internal connections of the LSTM. The total number of trainable weights can be calculated as: where TWS = number of trainable weights, i = inputs, o = outputs, and L = LSTM cells in the hidden layers. Selecting the best parameters for the neural network architecture makes the difference between poor and good performance.
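The gate mechanism and the notion of trainable weights can be illustrated with a single LSTM step in NumPy. This is a sketch of the standard LSTM formulation under stated assumptions: the names (`init_lstm`, `lstm_step`, `W_forget`, and so on), the layer sizes, and the input values are hypothetical and do not follow the paper's symbols, and the parameter count shown is the conventional one for one LSTM layer without an output layer, so it may differ from the expression used in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(n_inputs, n_cells, seed=0):
    """Create one weight matrix and one bias vector per gate."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("forget", "input", "output", "candidate"):
        params["W_" + gate] = rng.normal(
            scale=0.1, size=(n_cells, n_cells + n_inputs))
        params["b_" + gate] = np.zeros(n_cells)
    return params


def lstm_step(x_t, h_prev, c_prev, params):
    """Advance the hidden state h and the cell state c by one time step."""
    z = np.concatenate([h_prev, x_t])  # previous output joined with input
    f = sigmoid(params["W_forget"] @ z + params["b_forget"])  # forget gate
    i = sigmoid(params["W_input"] @ z + params["b_input"])    # input gate
    o = sigmoid(params["W_output"] @ z + params["b_output"])  # output gate
    g = np.tanh(params["W_candidate"] @ z + params["b_candidate"])
    c_t = f * c_prev + i * g   # keep part of the old cell state, add new
    h_t = o * np.tanh(c_t)     # expose a gated view of the cell state
    return h_t, c_t


n_inputs, n_cells = 1, 8  # assumed sizes, for illustration only
params = init_lstm(n_inputs, n_cells)
h, c = np.zeros(n_cells), np.zeros(n_cells)
for x in np.array([[0.2], [0.4], [0.6]]):  # three normalized readings
    h, c = lstm_step(x, h, c, params)

# Conventional count for one LSTM layer: 4 * (L * (L + i) + L).
n_weights = sum(p.size for p in params.values())
```

For these assumed sizes the layer holds 4 × (8 × 9 + 8) = 320 trainable weights; a dense output layer with o units would add o × (L + 1) more.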

Results
This paper proposes a method to predict the energy consumption of thirteen regions. The method uses the publicly available dataset of twelve regions on Kaggle and the data of one region (Najran) from https://data.gov.sa/Data/en/dataset. The following libraries were used for energy prediction: NumPy, pandas, TensorFlow, Matplotlib, and, from Keras, the Sequential model and the Dense, RNN, LSTM, and Dropout layers.

Model evaluation
The following regression metrics help in evaluating machine learning models: root mean squared error (RMSE), relative root mean squared error (rRMSE), mean absolute bias error (MABE), coefficient of determination (R²), mean bias error (MBE), and mean absolute percentage error (MAPE). The statistical equations of these metrics are given in Eqs (9)-(14), where u_w represents the actual value, n_w is the predicted value, and K represents the number of observations.
$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{w=1}^{K}\left(u_w - n_w\right)^{2}}$$
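As a hedged illustration, the six metrics named above can be computed as in the sketch below. The function name `regression_metrics` and the sample values are hypothetical. RMSE, MAPE, and R² follow their usual definitions; the forms used here for rRMSE (RMSE divided by the mean actual value), MBE (mean of predicted minus actual), and MABE (mean absolute error) are common conventions and are assumptions, since they may differ from the paper's own equations.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, rRMSE, MBE, MABE, MAPE (%) and R2 for two sequences.

    Assumes no actual value is zero and the actual values are not constant.
    """
    k = len(actual)
    if k == 0 or k != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / k
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    rmse = math.sqrt(ss_res / k)
    return {
        "RMSE": rmse,
        "rRMSE": rmse / mean_actual,
        "MBE": sum(errors) / k,
        "MABE": sum(abs(e) for e in errors) / k,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / k,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Made-up values for illustration; these are not results from the paper.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(actual, predicted)
```

In this toy example the over- and under-predictions cancel, so MBE is zero while MABE and RMSE are both 10, which shows why a bias measure alone cannot stand in for an error magnitude.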

The energy consumption for the PJMW region is shown in Fig 12, in which 30,000 testing hours are used to validate the proposed model's performance. Fig 13 shows the energy consumption for the COMED region, with predictions over 14,000 testing hours. The energy consumption for the DEOK region is shown in Fig 14, where the actual and predicted values for 14,000 hours are displayed. The energy usage for the DUQ region is covered in Fig 15; for DUQ, 14,000 hours are used for testing the proposed model. Fig 16 displays the DOM region's energy usage, covering 22,000 hours to verify the accuracy of the proposed model. Fig 17 shows the energy used in the PJML area, where 6,000 testing hours are considered for analyzing system performance. Fig 18 displays the findings for the Najran region, for which 20,000 testing hours are used to analyze the proposed model's prediction accuracy.
In terms of R², the proposed model achieves its highest value in the PJME region and its lowest in the PJM_Load region. For MAPE, the error is highest in the PJM_Load region and lowest in the DAYTON region. For MBE, the value is highest in the EKPC region and lowest in the PJME region. For RMSE, the value is highest in the EKPC region and lowest in the COMED region. For rRMSE, the value is highest in the PJM_Load region and lowest in the PJME region. For MABE, the value is highest in the PJM_Load region and lowest in the PJME region.
The proposed study analyzes the data of 13 regions and reports the values of R², MAPE, MBE, RMSE, rRMSE, and MABE in Table 2. The RNN and LSTM models were ensembled to obtain these values. The average R² is 0.9731, while the average MAPE is 1.9238, MBE 0.0281, RMSE 0.1658, rRMSE 0.0326, and MABE 0.1809. A comparison between existing research studies and the proposed work shows that the proposed model is more accurate on the given energy consumption dataset. The main aim is to obtain improved results. As Table 3 shows, [34] has excellent accuracy, but it reports only the values of MAPE and RMSE.
Similarly, [35] examines only the values of MAPE. These values help in assessing accuracy, yet they give no guidance about other important factors. [36] reports the values of R² and RMSE, [37] examines R², MAPE, and RMSE, while [38] reports MAPE and RMSE. All of these studies propose sound ideas for forecasting future power usage, yet their main drawback is the absence of significant factors and formulas.

Discussion
This study proposed an ensemble model for forecasting power consumption at an hourly resolution. The proposed model aims to deliver the best results in forecasting energy consumption, and the ensemble approach used in this research raises the accuracy. A single algorithm can be used to make predictions for the given dataset, but it is not always reliable. Combining two or more models can offer better results and increase the overall accuracy. The study ensembles two algorithms (RNN and LSTM); the initial weights may vary, but the design of these models is kept the same. The methodology is applied to the given dataset to produce genuine forecasts. The field of ensemble learning is well studied, and there are many variations on this simple theme. RNN is used in this study because of its strong performance in forecasting energy consumption. RNN is well suited to handling sequences of related data. RNNs apply weights to both the current and the previous data. Moreover, a recurrent neural network also adjusts the weights through both gradient descent and backpropagation through time. The given dataset contains historical data covering several years (different time spans for each region). These data allow the RNN to make predictions about the upcoming years. Alongside RNN, LSTM is used in this research to make the results more reliable.
The combination of LSTM and RNN yields higher accuracy in forecasting future consumption. LSTM is a special kind of RNN that shows excellent performance on a wide variety of problems. LSTM networks find useful applications in the following areas: language modeling, machine translation, handwriting recognition, image captioning, image generation using attention models, and question answering. It is a variant of RNN that is capable of learning long-term dependencies, particularly in sequence prediction problems. LSTM has feedback connections, i.e., it can process entire sequences of data, not only single data points such as images.
A comparison between existing research and the proposed work clearly shows that the accuracy is higher for the given power consumption dataset. The research analyzes R², MAPE, MBE, RMSE, rRMSE, and MABE. The aim is to obtain better results. As the table shows, [34] has impressive accuracy, yet it reports only the values of MAPE and RMSE. It is therefore difficult to judge its accuracy on other important measures.
Likewise, the authors of [35] examine the values of MAPE only. MAPE values are useful for assessing accuracy, yet they provide no guidance about other important variables. Other studies were reviewed for the same reason: [36] reports the values of R² and RMSE, [37] examines R², MAPE, and RMSE, while [38] reports MAPE and RMSE. All of these studies propose sound ideas for forecasting future power consumption, yet their main disadvantage is the absence of significant factors and formulas. The results therefore indicate that the combination of LSTM and RNN offers the best results, with higher accuracy and a very low error rate. Moreover, LSTM and RNN also achieved the smallest errors in contrast with other models, such as ARIMA and SARIMA, which did not offer satisfactory results.

Conclusion
This study uses LSTM and RNN because of their capacity to handle sequential data and their ability to maintain temporal relationships over the long term. The proposed model helps in determining different forecasting models for energy consumption. A comparison between existing studies and the proposed work shows that the accuracy is higher for the given power consumption dataset. The dataset is divided into 13 distinct regions, and each region's data give the hourly energy consumption over a particular time span. The model is trained to handle the dataset as a time series sequence. Once the model is trained and the optimal values are determined, the best LSTM and RNN network is applied to the data to predict hourly energy consumption. The forecast is made for the following years in the dataset. The results show that the ensembled LSTM and RNN offer the best results, with higher accuracy and a very low error rate. In addition, LSTM and RNN achieved the smallest errors in contrast with other models, such as ARIMA and SARIMA, which did not offer satisfactory results. Future work is to create hybrid models with even higher accuracy and speed. The work can be further improved by adding records of other useful variables to cover more important aspects. The proposed study provides clear guidance on forecasting energy consumption, outperforms other models, and can be used as a strong choice for future predictions.